Add vLLM multi-instance CPU benchmark skill #81

Open
rjauhari2 wants to merge 1 commit into amd:main from rjauhari2:add-vllm-multiinstance-skill

@rjauhari2

What

Adds vllm-multiinstance, a skill that benchmarks a vLLM CPU image on AMD EPYC. It runs N vLLM instances (each pinned to 32 physical cores) behind an NGINX load balancer, drives load with guidellm via ansible, and reports peak aggregate memory (from podman stats) plus end-to-end throughput and latency across models, concurrency rates, and instance counts. The benchmark harness is vendored with the skill, so only podman and ansible are required.
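
To make the per-instance pinning concrete, here is a minimal sketch of how N instances could each be given a disjoint block of 32 cores. It is not the harness's actual code: the function names are hypothetical, and it assumes physical cores are numbered contiguously from 0, which should be checked against the host topology (for example with lscpu) before relying on it.

```python
from typing import List


def cpuset_for_instance(index: int, cores_per_instance: int = 32, first_core: int = 0) -> str:
    """Return a cpuset range such as "0-31" for a zero-based instance index.

    Assumption: physical cores are numbered contiguously starting at
    first_core. This is illustrative and not taken from the harness.
    """
    if index < 0 or cores_per_instance < 1:
        raise ValueError("index must be >= 0 and cores_per_instance >= 1")
    start = first_core + index * cores_per_instance
    end = start + cores_per_instance - 1
    return f"{start}-{end}"


def plan_instances(n_instances: int, physical_cores: int, cores_per_instance: int = 32) -> List[str]:
    """Return one cpuset string per instance, refusing to oversubscribe the host."""
    needed = n_instances * cores_per_instance
    if needed > physical_cores:
        raise ValueError(f"{n_instances} instances need {needed} cores, host has {physical_cores}")
    return [cpuset_for_instance(i, cores_per_instance) for i in range(n_instances)]
```

Each returned string could be passed to podman run through its --cpuset-cpus option; on a host with 128 physical cores, plan_instances(4, 128) yields four non-overlapping ranges.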

It is robust on leaner hosts:

  • A host preflight (check-host.sh) fails fast with actionable guidance on unresolvable image short-names, missing rootless cgroup cpuset delegation, and CNI cniVersion mismatch.
  • start.sh auto-downgrades the CNI conflist, guards static-IP fallback, and fast-fails dead containers instead of hanging the health wait.
  • run_sweep.sh auto-detects missing passwordless sudo and runs ansible/guidellm rootless (ansible_become=false).
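
The fast-fail behavior described for start.sh can be pictured as a polling loop that checks whether the container is still alive before each health probe. The sketch below is an illustration only, not the script's logic: wait_until_healthy, ContainerDied, and the two probe callables are hypothetical names, and the timeout and poll interval are placeholder values.

```python
import time
from typing import Callable


class ContainerDied(RuntimeError):
    """Raised when the container exits before it ever reports healthy."""


def wait_until_healthy(
    is_running: Callable[[], bool],
    is_healthy: Callable[[], bool],
    timeout_s: float = 600.0,
    poll_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> None:
    """Poll until healthy, but abort at once if the container is no longer running."""
    deadline = clock() + timeout_s
    while True:
        # Check liveness first: a dead container can never become healthy,
        # so waiting out the full timeout would only hide the failure.
        if not is_running():
            raise ContainerDied("container exited before becoming healthy")
        if is_healthy():
            return
        if clock() >= deadline:
            raise TimeoutError(f"not healthy after {timeout_s} seconds")
        sleep(poll_s)
```

In practice is_running might wrap a podman inspect call and is_healthy an HTTP probe of the instance; both are left as injected callables here so the loop itself stays testable.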

Contents

  • skills/vllm-multiinstance/: SKILL.md, skill-card.md, README.md, reference.md, vendored harness/, and scripts/.
  • Registered in .claude-plugin/marketplace.json (and regenerated .cursor-plugin/marketplace.json).
  • Behavioral eval at eval/behavioral/tests/test_vllm_multiinstance.py.

Testing

  • Structural gate (.github/scripts/check.sh) passes with 0 errors and reports that the Cursor marketplace manifest is up to date.
  • Behavioral eval (LLM-judged, sonnet) passed 5/5. It covers sweep sizing, the guidellm.log-vs-benchmarks.json score rule, host-preflight fail-fast, image short-name remediation, and rootless/become auto-detection.
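
For readers unfamiliar with the rootless/become auto-detection exercised by the eval, one plausible way to implement it is sketched below. It is an assumption about the approach, not the contents of run_sweep.sh: the helper names are hypothetical, and it relies on `sudo -n true` exiting non-zero rather than prompting when a password would be required.

```python
import subprocess
from typing import Callable, List


def has_passwordless_sudo(run: Callable = subprocess.run) -> bool:
    """Return True if `sudo -n true` succeeds without prompting for a password."""
    try:
        result = run(
            ["sudo", "-n", "true"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            check=False,
        )
    except FileNotFoundError:
        # sudo is not installed at all, so privilege escalation is unavailable.
        return False
    return result.returncode == 0


def ansible_extra_vars(passwordless_sudo: bool) -> List[str]:
    """Return extra ansible-playbook arguments; disable become when sudo is unusable."""
    if passwordless_sudo:
        return []
    return ["-e", "ansible_become=false"]
```

Calling ansible_extra_vars(has_passwordless_sudo()) gives an argument list that can be appended to an ansible-playbook command line, so the same invocation works with or without passwordless sudo.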

Made with Cursor

Adds `vllm-multiinstance`: benchmarks a vLLM CPU image on AMD EPYC by
running N vLLM instances (each pinned to 32 physical cores) behind an
NGINX load balancer, driving load with guidellm via ansible, and
reporting peak aggregate memory (podman stats) + end-to-end
throughput/latency across models, concurrency rates, and instance
counts. The benchmark harness is vendored with the skill; only podman +
ansible are required.

Robust on leaner hosts: a host preflight (check-host.sh) fails fast with
actionable guidance on unresolvable image short-names, missing rootless
cgroup cpuset delegation, and CNI cniVersion mismatch; start.sh
auto-downgrades the CNI conflist, guards static-IP fallback, and
fast-fails dead containers instead of hanging the health wait;
run_sweep.sh auto-detects missing passwordless sudo and runs
ansible/guidellm rootless (ansible_become=false).

Contents: SKILL.md, skill-card.md, README.md, reference.md, vendored
harness/, scripts/; registered in .claude-plugin/marketplace.json
(+ regenerated .cursor-plugin manifest); behavioral eval at
eval/behavioral/tests/test_vllm_multiinstance.py.

Testing: structural gate (check.sh) passes with 0 errors; behavioral
eval (LLM-judged, sonnet) 5/5 passed -- covers sweep sizing, the
guidellm.log-vs-benchmarks.json score rule, host-preflight fail-fast,
image short-name remediation, and rootless/become auto-detection.

Signed-off-by: Rahul Jauhari <rahul.jauhari@amd.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Change-Id: Ifa419ea79793fcfb303b6f1cc657539b22622f8b